Sir: 

I, Victor V. Levenson, M.D., Ph.D., being duly warned, hereby declare and state that: 

1. I am a co-inventor of the invention disclosed and presently claimed in the above-captioned 
patent application. 

2. The attached document entitled "Grant Proposal" (EXHIBIT A) describes experiments that 
were performed utilizing methods disclosed and presently claimed in the above-captioned patent 
application. The Grant Proposal was submitted to the Susan G. Komen Foundation and was not funded. 
3. The Grant Proposal discloses experiments in which breast cancer was diagnosed or 
characterized in patients by detecting methylation in DAPK1 and additional genes that together were 
utilized as a composite "biomarker." (See Grant Proposal at page 4.) 

4. As discussed in the Grant Proposal, plasma samples from twenty-nine (29) normal patients or 
plasma samples from twenty-nine (29) patients with ductal carcinoma in situ (DCIS) were obtained. 
Genomic DNA was isolated from the samples, and the methylation status of DAPK and additional 
genes including FAS, MCTS1, CDKN2A, PAX5, PGK1, RPL15, THBS, TNFSF11, and VHL was 
assessed utilizing the methylation assay disclosed in the above-captioned patent application. 
Approximately 89.3% of DCIS samples were found to exhibit methylation in the DAPK1 gene while 
only 51.9% of normal samples were found to exhibit methylation in the DAPK gene. Utilizing 
statistical methods disclosed in the Grant Proposal, the biomarker comprising DAPK1 and additional 
genes was shown to identify DCIS with approximately 84% sensitivity and approximately 80% 
specificity. (See Grant Proposal at page 4, Table 4 and at pages 7-8, under heading "Statistical 
analysis".) 

5. Also as discussed in the Grant Proposal, plasma samples from three (3) healthy patients or 
plasma samples from three (3) patients with atypical ductal hyperplasia (ADH) also were obtained. 
Genomic DNA was isolated from the samples and the methylation status of DAPK and additional 
genes was assessed utilizing the methylation assay disclosed in the above-captioned patent 
application. (See Grant Proposal at page 4, Table 5, and accompanying text.) Approximately 37.5% 
of ADH samples were found to exhibit methylation in the DAPK1 gene, while no healthy samples 
were found to exhibit methylation in the DAPK gene (0%). 
6. I further declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further, that these statements 
were made with the knowledge that willful false statements and the like are punishable by fine or 
imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such 
willful false statements may jeopardize the validity of the application or any patent resulting 
therefrom. 

Date: April 22, 2008 

By: /Victor Levenson/ 

Victor V. Levenson 
GRANT PROPOSAL 
(Exhibit A) 



Blood-based Biomarkers of Breast Cancer Risk 

In this project we will develop and validate a blood-based biomarker for detection and diagnosis of atypical 
ductal hyperplasia (ADH). We will use a novel technique of methylation analysis to detect ADH-specific 
changes of methylation in cell-free circulating DNA from blood. As a result we will produce a clinically 
relevant biomarker, which can be used for screening of women at risk for breast cancer development. The 
diagnostic biomarker developed in this project will be ready for a prospective clinical trial. 

• Background. Despite efforts in the fight against breast cancer, each year the disease causes nearly 40,000 
deaths in the US. Ductal carcinoma is the most common type of breast cancer, accounting for nearly 75% of all 
cases. 1 Early detection of the tumor, when it is still small and well-contained, remains the most efficient way to 
reduce cancer-related mortality. Currently detection relies on screening by mammography, which is limited by 
tissue density (100% sensitivity in fatty versus 47% in dense breasts), stage of tumor (81% for invasive ductal 
carcinomas (IDC) versus 55% for ductal carcinomas in situ, DCIS), age, hormonal status, and other issues. 2 As 
a result, screening benefits mostly women older than 40 with late stage tumors. 3 This situation is clearly 
unacceptable and calls for development of alternative detection techniques in order to identify an earlier lesion, 
atypical ductal hyperplasia (ADH), 4, 5 and recognize women at risk for breast cancer development. The 
diagnostic criteria of ADH are imperfect, and rely on the absence of certain features of DCIS, 6 size of the 
lesion, and histologic and cytologic criteria. 7 Nonetheless, association of ADH with greater risk of invasive 
ductal carcinoma is firmly established, 8-10 so identification of ADH in a screening assay will be clinically 
significant. In this project we will develop a biomarker for detection and diagnosis of ADH. 

Biomarkers. A screening test has to measure tumor-related factors in an observer-independent assay. 
Detection based on such biomarkers is not prone to errors from poorly controlled circumstances, and allows 
standardization of the test procedure. 

► An observer-independent biomarker can eliminate many problems of a screening test associated with 
variability of subjective readout; such a biomarker will improve accuracy of disease detection. 

Ideal biomarker. There are several well-established principles for biomarker development. An ideal 
biomarker should be specific for a particular disease and undetectable under physiological conditions; the 
specimen for the assay should be collected in a minimally invasive and inexpensive manner; the assay should be 
easy, reproducible, rapid, and inexpensive; 11 and dynamic changes of the biomarker should reflect disease 
progression (Fig. 1). Known genetic markers (e.g. mutations) do not reflect variability at either the cellular or 
the clinical level and remain constant (no dynamic range). Protein marker levels are tightly linked to functional 
activity of the cell, and can fluctuate widely (dynamic range unrelated to disease). As a result, genetic markers 
can predict risk but will not detect changes specific for the onset of cancer, while protein levels change due to 
unrelated factors, either masking the disease or producing false-positive results. 

► The ideal biomarker should reflect functional change (e.g., from low to high risk or from high risk but 
disease-free status to overt but early disease) but should not be influenced by unrelated events. 

Correlative and mechanistic biomarkers. Any feature that is linked to a disease can be used as a 
biomarker. A simple association with a clinical symptom is sufficient to use a feature as a biomarker even when 
its basis is unknown (correlative biomarkers; e.g. fever as a classic biomarker of infection). Alternatively, 
biomarkers can emerge from analysis of disease mechanisms (mechanistic biomarkers). Once the natural history 
of the disease is known, the possibilities are well-defined, and a search for mechanistic biomarkers can be very 
efficient. In complex diseases such as cancer, the poorly understood process of tumor initiation does not allow 
targeted testing and initially requires correlative biomarkers for risk assessment and early detection. 

Fig. 1. Different biomarkers and their role in evaluation of tumor growth. 

At first correlative biomarkers may have no obvious links to the known pathology of a disease: 
amplification of a certain DNA fragment in drug resistant cells 12 was initially established as a correlative 
marker, and the mechanistic value of this correlation was revealed later (discovery of multidrug resistance gene 
MDR1 in the amplified fragment 13 ). Similarly, Ki-67 antibodies were developed and their value for tumor 
assessment was established before even the composition of the Ki-67 antigen was known; 14 it is revealing that 
Ki-67 function even now is enigmatic. 14 A systematic search for disease-specific correlative biomarkers 
requires comparison of multiple elements in disease and healthy tissues; further analysis of these biomarkers 
can reveal previously unsuspected features and new disease mechanisms, which can result in new treatments. 

► For complex diseases with poorly understood pathogenesis there is insufficient information for 
development of mechanistic biomarkers. In contrast, correlative biomarkers satisfy the immediate medical 
needs, and provide the groundwork for mechanistic understanding of the disease. 

Biomarkers for breast cancer detection and risk assessment. While multiple biomarkers exist for breast 
cancer monitoring, 15 biomarkers for early detection remain a challenge. HER2/neu amplification has been 
suggested for risk assessment in asymptomatic women, 16, 17 but the assay requires tissue biopsy, which will 
make it unsuitable for many women. Analysis of biomarkers in nipple aspirate fluid or in ductal lavage is too 
labor-intensive, expensive, and unreliable, 18 leaving blood as the medium of choice. Tumor cells, 19 tumor- 
specific antigens and autoantigens, 20 tumor RNA and DNA 21,22 can be recovered from blood, and have been 
tried as breast cancer biomarkers (lipids, 23 polyamines, 24 proteins, 25 RNA, 21 and DNA 22 ), although none has yet 
emerged as a biomarker for breast cancer. 

DNA methylation as a biomarker. DNA-based biomarkers have a significant advantage because DNA can 
be amplified by PCR. Epigenetic change (DNA methylation) is a stable modification, which is linked to gene 
expression and reflects the cell's function. 26 Methylation is precisely located in cytosines within the cytosine- 
guanosine dinucleotides (CpG) and CpG islands of gene promoters, 27 and the first methylation-based biomarker 
for breast cancer was recently reported (unfortunately, its accuracy was low even for invasive cancer). 28 

► Although the potential of DNA methylation for development of breast cancer biomarkers is recognized, there 
are no biomarkers even for early detection. No biomarkers for risk assessment are available. 

Composite methylation biomarker. A search for methylation biomarkers would be uncomplicated if a 
certain region was always methylated in breast cancer. However, there is only a certain probability of 
methylation, and no specific pathology is invariably associated with methylation of a unique gene. This 
situation can be improved if multiple genes are analyzed to create a methylation profile. 29 These profiles for 
patients and healthy controls can then be compared to select genes with the most pronounced differences in 
methylation. Such differentially methylated genes will be combined to form a composite correlative biomarker. 
Successful selection of appropriate genes depends on their number in the initial methylation profile: the more 
genes analyzed, the better are the chances of finding highly significant differentially methylated genes, which 
will form a successful composite biomarker. This approach will be used for our project. 

► Multiple genes analyzed for methylation in each sample create the sample's methylation profile; genes 
that are consistently selected as differentially methylated in methylation profiles of healthy individuals and 
patients with ADH will generate a composite biomarker for ADH. 

Techniques for methylation detection can be chemical or biological: bisulfite chemically converts 
unmethylated cytosines into uracils, leaving methylated cytosines intact; 30 alternatively, DNA can be treated 
with methylation-sensitive restriction enzymes, which digest only unmethylated but not methylated DNA. 31 
Bisulfite conversion is a harsh treatment, which destroys 85-95% of input DNA, 32 while milder procedures lead 
to incomplete conversion and ambiguous results. 33 Bisulfite modification is combined with different detection 
techniques, e.g. sequencing, methylation-specific PCR 34 and its derivatives. All methods except sequencing 
have limited resolution and evaluate methylation only in some cytosines in each fragment. In addition, in a 
heterogeneous clinical sample only some DNA fragments of each kind may be methylated, and positive 
methylation readout tends to ignore similar but unmethylated fragments. To avoid this error, additional testing 
is required, so two different reactions have to be performed, one for methylated target and the other for 
unmethylated. 34 If the sample is indeed heterogeneous and both methylated and unmethylated sequences are 
present, the overall result can be ambiguous. 34 

Alternatively, detection of methylation can use different activity of certain restriction enzymes on 
methylated and unmethylated DNA. Such enzymes leave DNA undigested if a cytosine within the recognition 
site is methylated, so only non-methylated templates are destroyed while methylated templates are preserved 
and can be detected. 35 If all restriction sites within the region are methylated, the whole region is scored as 
methylated; a single unmethylated site will lead to destruction of the template's integrity, and the region will be 
scored as unmethylated. To make comparison between different fragments possible, a similar number of sites in 
each fragment is analyzed. In homogeneous systems, detection of methylation is relatively simple. 36 In 
heterogeneous clinical samples the threshold to define methylated and unmethylated fragments is established to 
standardize the readout (see Study Design). 
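The all-or-nothing scoring rule described above can be illustrated with a short sketch. This is an illustration only, not the laboratory's software: the function name `score_region` and the boolean encoding of site status are assumptions introduced here.

```python
from typing import Sequence


def score_region(site_methylated: Sequence[bool]) -> str:
    """Score one fragment from the state of its restriction sites.

    Assumption: each entry is True when that restriction site is methylated
    (protected from digestion) and False when it is unmethylated (cut).
    """
    if not site_methylated:
        raise ValueError("a region needs at least one restriction site")
    # A single unmethylated site lets the enzyme cut the template, so the
    # region is preserved (and scored methylated) only if every site is protected.
    return "methylated" if all(site_methylated) else "unmethylated"


print(score_region([True, True, True]))
print(score_region([True, False, True]))
```

Because one cut site is enough to change the call, fragments can be compared fairly only when they carry a similar number of sites, which is the reason given above for analyzing a similar number of sites in each fragment.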

► Bisulfite methods of DNA methylation analysis cause degradation of the major part of the sample, depend 
on chemical changes in DNA sequence, and may produce ambiguous results. In contrast, methylation-sensitive 
restriction enzyme-based techniques do not inflict DNA damage or changes to DNA sequence, and are designed 
to produce unambiguous data. 

Special features of clinical samples. Clinical samples are always limited and heterogeneous, so differences 
in analytical techniques are especially important. Bisulfite treatment destroys over 80% of input DNA, 32 
triggering concerns about specific sequences that are preferentially destroyed. Even if there is no preference, 
limited amount of DNA from a clinical sample is further diminished by bisulfite treatment. 

The bulk of cytosines is located outside of CpG dinucleotides, so they are not methylated. All of them are 
converted to uracils by bisulfite, which causes major changes in DNA sequence. As a result, unanticipated 
effects can emerge, e.g. differences in PCR efficiency for sense and antisense strands of DNA. 37 

Some bisulfite-based techniques allow quantitative measurement of methylation at specific sites, which is 
important for homogeneous samples but becomes a liability for heterogeneous samples. For example, 
methylation of stratifin increases in ductal carcinoma cells with tumor development, but in stromal cells the same 
gene is completely methylated in normal breast and in tumor. 38 As a result, the same tumor sample will give 
different results depending on the number of stromal cells in the test sample. Restriction enzyme-based 
techniques avoid quantitative assessment, so differences in stromal cells will not affect the readout, and stratifin 
will be scored as "methylated". 

► DNA destruction in bisulfite reaction is a concern for clinical samples, which are limited and 
heterogeneous (a specific fraction of DNA may be more sensitive to degradation). Quantitative analysis of 
methylation by bisulfite-based techniques may also complicate the interpretation because specimen 
heterogeneity rather than disease-related changes may be responsible for quantitative differences. Restriction 
enzyme-based techniques are not prone to these problems. 

Plasma as the source of tumor-specific DNA has been used to analyze tumor-specific mutations (e.g. ) 
and methylation (e.g. 28 ). Plasma DNA is found in healthy subjects; 40 in cancer patients its concentration is much 
higher. 41 Considering that risk assessment will target asymptomatic women, our assay has been intentionally 
developed for use with plasma, which will allow repeated sampling and will not cause undue discomfort 
associated with tissue biopsy. It is important to emphasize that cell-free circulating DNA in plasma should be 
considered a heterogeneous clinical sample raising all the concerns mentioned in the previous section. 

► Tumor-specific DNA from plasma allows analysis of tumor-specific methylation in samples collected with 
a minimally invasive procedure. Cell-free DNA from plasma should be considered a heterogeneous clinical 
sample. 

Assay for methylation detection has been developed in our laboratory 36 and successfully used with clinical 
samples. 42 Methylation status of 56 genes (a profile) is determined for each sample (Fig. 2A). The profiles are 
processed to find differentially methylated genes in control and disease samples (Fig. 2B), and these genes form 
Fig. 2. Assay for development of composite DNA methylation biomarkers. 

Table 2. Biomarker genes in tissues. 

Table 5. Differences in methylation detected in plasma DNA. 

brain cancer, and plasma DNA from ovarian, lung, pancreatic, and prostate cancer were analyzed. For breast 
cancer (tissues) the accuracy of cancer detection ranges between 70 and 95% (Table 1), and is the best for ADH 
lesions (set 3), where core biopsies were analyzed. Heterogeneity of the tissue 

and their methylation in plasma of DCIS patients and healthy controls. 

Table 4. DCIS detection in 

many differences in methylation that are revealed in a more homogeneous material from core biopsies (ADH 
samples). Indeed, the composite biomarker for DCIS and IDC has 10 genes, while the biomarker for ADH has 9 
additional genes (Table 2). The set of genes is almost identical for DCIS and IDC biomarkers, and genes for ADH 
are present in the profiles for DCIS and IDC. These similarities indicate that the biomarkers identify common 
biological features of breast cancer and can recognize it at the stage of ADH. 

Tissue biomarkers have less clinical value than biomarkers from blood, so 
we compared profiles of plasma DNA for patients diagnosed with DCIS and 
healthy controls. Ten genes showed significant differences in methylation 
(Table 3) and were selected for the composite biomarker. We tested this 
biomarker with 25 rounds of 5-fold cross-validation and registered sensitivity 
and specificity of DCIS detection (statistical procedures are described in the 
Study Design). Results indicated that the biomarker identified DCIS with 84% 
sensitivity and 80% specificity (intersections of predicted and actual sample 
status) with type II error at 16% and type I error at 20% (Table 4). 
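The relation between the four reported figures can be made explicit with a small sketch: the type II error is the complement of sensitivity and the type I error is the complement of specificity. The counts used below are hypothetical and were chosen only so that the rates reproduce the reported 84% and 80%; they are not the counts in Table 4.

```python
def diagnostic_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Rates from a 2x2 table of predicted versus actual sample status."""
    sensitivity = tp / (tp + fn)   # fraction of actual cases predicted as cases
    specificity = tn / (tn + fp)   # fraction of actual controls predicted as controls
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "type_II_error": 1 - sensitivity,  # missed cases (false negatives)
        "type_I_error": 1 - specificity,   # false alarms (false positives)
    }


# Hypothetical counts, for illustration only.
rates = diagnostic_rates(tp=84, fn=16, tn=80, fp=20)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```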

For this proposal we compared methylation profiles of cell-free circulating 
DNA from ADH patients and healthy women in a small set of samples (Table 
5; eight samples per group). While this set is by no means conclusive, it shows 
that differences in DNA methylation can be detected in plasma of ADH cases 
and healthy controls. Our previous experience indicates that genes with a 20% 
methylation difference in the trial set almost always become components of the 
composite biomarker, suggesting that 18 genes can contribute to the ADH 
biomarker. 
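The 20% rule of thumb mentioned above can be sketched as a simple screen on per-gene methylation frequencies. This is a minimal illustration under stated assumptions: the gene names, the data layout (a dictionary of per-sample calls), and the function names are hypothetical, and the example calls are invented to show the arithmetic with eight samples per group.

```python
def methylation_frequency(calls):
    """Fraction of scored samples called methylated (None means no call)."""
    scored = [c for c in calls if c is not None]
    return sum(scored) / len(scored) if scored else float("nan")


def candidate_genes(cases, controls, min_difference=0.20):
    """Return {gene: frequency difference} for genes differing by >= min_difference.

    cases / controls: dict mapping gene name -> list of True/False/None calls.
    """
    selected = {}
    for gene in cases.keys() & controls.keys():
        diff = methylation_frequency(cases[gene]) - methylation_frequency(controls[gene])
        # A NaN difference (no scored samples) fails this comparison and is skipped.
        if abs(diff) >= min_difference:
            selected[gene] = diff
    return selected


# Hypothetical calls for two placeholder genes.
cases = {"GENE_A": [True] * 3 + [False] * 5, "GENE_B": [True] * 4 + [False] * 4}
controls = {"GENE_A": [False] * 8, "GENE_B": [True] * 3 + [False] * 5}
print(candidate_genes(cases, controls))
```

In this example only GENE_A passes (37.5% versus 0%), while GENE_B (50% versus 37.5%) falls below the 20% difference.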
• Hypothesis and objective. 

DNA methylation is different in normal and cancerous tissues; cancer-specific DNA can be detected in 
bloodstream (cell-free plasma DNA); a composite biomarker has been developed for tissue-based breast cancer 
detection; a similar biomarker has high sensitivity and specificity for detection of DCIS using cell-free plasma 
DNA; multiple differences have been detected in methylation of plasma DNA from patients with ADH and 
healthy controls. Based on these findings we hypothesize that a composite biomarker based on methylation 
profiling of cell-free plasma DNA can be developed for patients with ADH. Our objective is development and 
validation of such a biomarker. 

• Specific Aims. 

♦ Aim 1: to develop a methylation-based composite biomarker for detection of atypical ductal hyperplasia. 

We will analyze methylation of 56 genes in each sample of circulating plasma DNA of three groups of 
subjects: women with ADH found in biopsy; women whose biopsy results revealed only benign disease; healthy 
women. All groups will have a similar age distribution for participants; there will be 35 independent samples 
(collected from 35 different patients) in each group. We will then identify genes differentially methylated in 
ADH patients and control groups and select informative genes for the biomarker. The sensitivity and the 
specificity of the biomarker will be determined using 25 rounds of 5-fold cross-validation. 
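The resampling scheme named in this aim can be sketched as follows. This is a minimal, dependency-free illustration of repeated stratified five-fold splitting, not the project's analysis code; the function name, the round-robin assignment within each group, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def repeated_stratified_folds(labels, n_folds=5, n_rounds=25, seed=0):
    """Yield (round, fold, train_idx, test_idx) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for rnd in range(n_rounds):
        folds = [[] for _ in range(n_folds)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each group's samples round-robin so that every fold keeps
            # roughly the same group proportions as the full set.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)
        for k in range(n_folds):
            test_idx = sorted(folds[k])
            train_idx = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield rnd, k, train_idx, test_idx


# Three groups of 35 samples, as planned for Aim 1.
labels = ["ADH"] * 35 + ["benign"] * 35 + ["healthy"] * 35
splits = list(repeated_stratified_folds(labels))
print(len(splits))  # 125 = 25 rounds x 5 folds
```

With 35 samples per group, each test fold holds 7 samples from every group (21 in total) and each learning set holds the remaining 84.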

♦ Aim 2: to validate performance of this composite biomarker for detection of atypical ductal hyperplasia using 
blinded specimens. 

The composite biomarker for ADH developed in Aim 1 will be tested in a separate blinded set of 60 
independent specimens. The set will contain 20 samples from each of the three groups, and the accuracy 
(specificity and sensitivity) of ADH detection will be determined. 

♦ Aim 3: to analyze performance of this composite detection biomarker for diagnosis of atypical ductal 
hyperplasia using a separate set of blinded specimens collected at a breast clinic from 100 women. About 25 of 
these specimens are expected to have ADH. 

We will test the composite detection biomarker developed in Aim 1, using 100 blinded specimens collected 
at a breast clinic. These specimens will contain samples from healthy women and women with different types of 
cancer and benign breast disease including microcalcifications, so the diagnostic potential of the developed 
biomarker will be established. 

► This approach will determine (Aim 1) and validate the accuracy of ADH detection (Aim 2), and the 
accuracy of ADH diagnosis in a set of samples representing different diseases of the breast (Aim 3). 

• Study Design. 

Overview. In this study we will design and verify the composite biomarker using the training group of 105 
open-label samples. The developed biomarker will be validated in a blinded group of 60 specimens of ADH and 
controls to assess its performance for ADH detection. Then the biomarker's performance will be evaluated in a 
blinded group of 100 samples from women with different diseases of the breast to evaluate its performance as a 
diagnostic biomarker (expected frequency of ADH in women with microcalcifications is 31%). 43 This strategy 
follows the standards of biomarker development, 11 and emphasizes ADH detection and diagnosis as the primary 
goal. Evaluation of the biomarker with blinded samples completes phases 2 and 3 of biomarker development. 

Patients. Plasma specimens will be collected from women with small lesions (mammography - less than 
2 mm 44 ) prior to biopsy at OSF Saint Anthony Center for Cancer Care, Rockford, IL. Samples are then stratified 
as ADH, DCIS, LCIS, IDC or benign based on the pathology report of the excised lesion. This report is then 
confirmed by Dr. Barbara Susnik (Northwestern University, Department of Pathology), who will review all 
samples for consistency. Plasma collection prior to biopsy will prevent any changes in the methylation profile 
related to anesthesia or biopsy itself. While different descriptions of ADH have been suggested, 6, 10, 45 in the 
most current definition ADH is a uniform population of regularly arranged small round, cuboidal or polygonal 
hyperchromatic cells; nuclei are evenly distributed, and only single small nucleoli are seen; mitoses, especially 
abnormal, are infrequent. 44 "Healthy" controls are defined as women without neoplastic or chronic 
inflammatory disease. To account for age-related methylation changes all groups will have comparable age 
distribution. Control groups will contain specimens from women whose biopsy results are negative for ADH 
and contain only benign lesions, and specimens from healthy women, collected separately. There are no data on 
race-related differences, but this option will be considered in the statistical analysis. The project has been 
reviewed and approved by the Institutional Review Board (IRB) of Northwestern University and of the OSF 
Saint Anthony Center for Cancer Care. Collection protocol ensures that specimens will include plasma from 
women with LCIS, DCIS, IDC, and benign breast disease including microcalcifications; in this latter group 31% 
of patients are expected to have ADH. 43 

Assay. Analytical and statistical techniques have been tested with over 300 specimens of DNA from 
tissues (breast, ovarian, and brain cancer) and plasma (breast, ovarian, prostate, lung, pancreatic, and colon 
cancer). Plasma-based biomarkers have been designed for ovarian cancer (Melnikov, submitted) and for DCIS 
(Melnikov, in preparation). To assess methylation, purified plasma DNA is divided into two parts - one is 
treated with the methylation-sensitive restriction enzyme Hin6I, and the other is untreated. Methylated 
fragments are preserved, while the unmethylated template DNA is destroyed and cannot serve as a template for 
PCR. Both digested and control DNAs are used for nested PCR with gene-specific primers; aminoallyl-dUTP is 
added for the second round of amplification. The aminoallyl groups of incorporated dUTPs are then coupled to 
reactive Cy5 or Cy3 dyes, creating fluorescently labeled products. One dye is used for PCR products from 
undigested control DNA, another for PCR products from Hin6I-digested DNA. Both labeled products are 
hybridized to a custom-designed microarray with probes for amplified fragments. Each microarray has three 
identical subarrays to follow consistency of the signal. Fluorescence of both fluorophores in every spot is 
quantified, and the Cy5/Cy3 ratio is calculated. 

A Cy5/Cy3 ratio specific for completely methylated DNA in each spot is determined and used to define 
methylation status for each fragment. Several spots provide quality control: empty spots determine background, 
spots for unmethylated targets serve to control efficiency of digestion, and spots with probes for non-human 
DNA define nonspecific binding. 

Statistical analysis compares methylation profiles in cases and controls to select the most differentially 
methylated genes. These genes are then analyzed as a group, and the classification accuracy is established. Data 
analysis is described below. 

Methods. 

Collection of blood follows the standard procedure with EDTA-containing BD Vacutainer tubes. Tubes are 
centrifuged at 1,100 x g for 10 min. Supernatant is collected, placed into another tube, and centrifuged again. 
Final plasma is collected and frozen at -70°C. 

Purification and quantitation of cell-free circulating DNA. Plasma (100 µl) is mixed with 2X Proteinase K 
buffer (100 mM NaCl, 10 mM Tris-HCl, pH 8.0, 10 mM EDTA, 0.5% SDS) and incubated at 55°C for 6 hr with 
Proteinase K (1 mg/ml). DNA is purified with DNAzol (MRC, Cincinnati, OH): DNA is mixed with 10 vol of 
DNAzol, and DNA is precipitated by 0.5 vol of 100% ethanol, washed with 80% ethanol, and dissolved in 10 
mM Tris pH 7.8, 0.5 mM EDTA. Concentration is determined using Hoechst 33248 and DyNAQuant 2000 
(Hoefer, GE Healthcare, Piscataway, NJ). 

Digestion with methylation-sensitive restriction enzyme Hin6I (Fermentas, Hanover, MD; recognition site 
GCGC) is done in 100 µl at 37°C. Two ng of DNA is incubated with 40 U of the enzyme. The second aliquot is 
incubated without the enzyme, processed side-by-side with digested DNA as a control, and only fragments with 
a signal from control DNA are scored. 

PCR amplification: 400 pg of digested and control DNAs are used for the first round of nested PCR. 
KlenTaq1 (DNA PolTech, St Louis, MO) is used at 20 U per 50 µl reaction. PCR is done for 25 cycles (95°C, 
45 sec; 62°C, 1 min; 72°C, 1 min). QIAquick PCR Purification Kit (Qiagen, Valencia, CA) is used to purify 
PCR products. For the second PCR 400 pg of combined products are used; PCR contains a mix of 
aminoallyl-dUTP (Biotium, Hayward, CA) and dTTP (3:1), and is done as before for 20 cycles. PCR products are 
purified with QIAquick PCR Purification Kit, eluted and combined. 

Coupling aminoallyl-labeled PCR products to Cy dyes. Purified products of the second PCR are dried and 
dissolved in 5 µl of 200 mM NaHCO3 (pH 9.0). Cy3 or Cy5 in DMSO is added and mixed. Labeling continues 
for 2 hrs at room temperature in the dark, and the labeled products are precipitated with ethanol. 

Development of the array. All the genes for the array (Table 6) have been shown to be methylated in breast 
cancer. Oligonucleotide arrays (with 50-60 mer probes) are custom designed by Microarrays, Inc (Nashville, TN). 
Three sets of control probes are present: transcribed regions from A. thaliana (definitive negative control, 
heterologous); transcribed regions of human α-tubulin, β-actin and GAPDH (definitive negative controls, 
homologous); promoters of β-actin, phosphoglycerate kinase and ribosomal protein L15 (conditional homologous 
negative control). Oligonucleotides contain an amino group and a six-carbon spacer at the 5'-end. They are spotted 
on an aminosilane-modified glass slide in triplicate, so each slide contains three identical subarrays. 

Hybridization. The slides are pre-hybridized in rotating tubes at 42°C in 5xSSC, 0.1% SDS, 1% BSA; 
denatured DNA is added, the coverslip is sealed, and the slides are hybridized for 18 hr at 42°C. After 
hybridization the slides are washed at 42°C in 1xSSC, 0.1% SDS and in 0.1xSSC, 0.1% SDS, and dried. 

Detection of the signal. ScanArray™ 4000XL (Perkin-Elmer, Wellesley, MA) is used to record the signal. 
Signal is acquired by the EasyScan using the Adaptive Circle algorithm. 

Normalization of the signal is designed for a 'directed array', in which all Cy5/Cy3 ratios should be greater 
than or equal to 1. Complete methylation of a fragment should produce a Cy5/Cy3 ratio of 1 if the fluorophores' 
performance is identical. Since fluorophores are different, a coefficient for normalization of the Cy5/Cy3 ratios 
is established using a "self-self" technique (Fig. 3); 47 a single PCR product is separated into two parts, each is 
labeled with a different fluorophore, mixed, and used for hybridization. In this mixture Cy5- and Cy3-labeled 
fragments are equally represented, imitating completely methylated fragments for each spot, so 
fluorophore-related discrepancy in detection can be adjusted with "self-self"-derived coefficients (Standard 
Methylation Calls or SMCs). 
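One way to picture how a per-probe SMC could be derived from a self-self hybridization is sketched below. Several details are assumptions made for illustration and are not stated in the text: that background is subtracted from both channels before the ratio is taken, that the three replicate subarrays are summarized by their median, and the function names themselves.

```python
from statistics import median


def cy_ratio(cy5: float, cy3: float, background: float = 0.0) -> float:
    """Background-subtracted Cy5/Cy3 ratio for one spot (assumed correction)."""
    numerator = cy5 - background
    denominator = cy3 - background
    if denominator <= 0:
        # No Cy3 signal above background: the ratio is unbounded.
        return float("inf")
    return numerator / denominator


def standard_methylation_call(self_self_spots, background: float = 0.0) -> float:
    """SMC for one probe from replicate spots of a self-self hybridization.

    self_self_spots: (cy5, cy3) intensity pairs, e.g. one per subarray.
    Summarizing replicates by the median is an assumption of this sketch.
    """
    return median(cy_ratio(c5, c3, background) for c5, c3 in self_self_spots)


# Hypothetical intensities for one probe on three subarrays.
smc = standard_methylation_call([(1200, 1000), (1150, 1000), (1300, 1000)])
print(round(smc, 2))
```

Here the self-self ratio sits above 1 even though both labeled halves come from the same PCR product, which is the fluorophore-related discrepancy that the SMC is meant to absorb.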

Table 6. Genes for analysis of methylation. 

Fig. 3. Experimental approach to determine methylation threshold. 

Statistical analysis. The first step involves non-specific filtering to remove unreliable data and retain only 
informative genes. "Unreliable" is determined as a gene where signal from undigested DNA is less than 3X 
background. "Informative gene" is determined as a gene that gives an interpretable methylation call in at least 
80% of all samples. After filtering, the Cy5/Cy3 ratio for completely methylated DNA (SMC) is used to dichotomize 
data for each gene into "methylated" (M; ratio < SMC), and "unmethylated" (UM; ratio > 1.1 times SMC); data 
between these thresholds are scored as "unassigned" (UA; SMC < ratio < 1.1 times SMC) and removed. Potential 
components of the biomarker are then selected by the chi-square test of association and Fisher's Exact Test with 
a 0.10 significance level. P-value correction for multiple comparisons is not essential, since differential 
methylation of individual genes is just an intermediate result. Significance at a nominal alpha level of 10% 
effectively serves as a filter for informative gene selection. The naive Bayes algorithm for pattern recognition 
tasks is used to identify genes for the composite biomarker. Misclassification errors are determined by twenty-five 
rounds of stratified five-fold cross-validation. 49 For each iteration the data are randomly divided into five 
groups with four groups assigned to a learning set, and the fifth group left as a testing set. The naive Bayes 
algorithm is applied to the learning set to choose contributing genes and establish classification parameters, 
which are then used with the test set to determine its accuracy. The process is repeated five times so that each 
group is used as the test set once. The sets are then combined, and another iteration begins. Previous experience 
indicates that at least 5-10 genes will be found as differentially methylated in each cross-validation step. The 
final biomarker will contain genes that are consistently chosen in the cross-validation steps. The biomarker will 
be considered successful if cross-validation achieves at least 85% sensitivity and at least 80% specificity for 
ADH. Several alternate classification methods will be assessed to determine the robustness of the conclusions to 
the choice of statistical algorithms. Logistic regression and partial least squares will be employed as the 
alternate classification methods. 50 
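
The dichotomization and the repeated cross-validation described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm developed for the assay: ratios exactly equal to the SMC are treated as methylated (the boundary is not specified), the classifier is a plain Bernoulli naive Bayes with Laplace smoothing, gene selection inside each learning set is omitted, and all names and the toy data are hypothetical.

```python
import math
import random

def call(ratio, smc):
    """Dichotomize one Cy5/Cy3 ratio against the spot's SMC.

    A ratio exactly equal to SMC is treated as methylated (assumption).
    """
    if ratio <= smc:
        return "M"
    if ratio > 1.1 * smc:
        return "UM"
    return "UA"  # unassigned; such data are removed before classification

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def train_nb(X, y):
    """Bernoulli naive Bayes on 0/1 methylation calls, Laplace-smoothed."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[cls] = (math.log(len(rows) / len(y)), probs)
    return model

def predict_nb(model, x):
    def score(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(math.log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(model, key=score)

def repeated_cv(X, y, rounds=25, k=5, seed=0):
    """Pooled (sensitivity, specificity); label 1 = case, 0 = control."""
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    for _ in range(rounds):
        for test in stratified_folds(y, k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            model = train_nb([X[i] for i in train], [y[i] for i in train])
            for i in test:
                pred = predict_nb(model, X[i])
                if y[i] == 1:
                    tp += pred == 1
                    fn += pred != 1
                else:
                    tn += pred == 0
                    fp += pred != 0
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: gene 0 is methylated (1) only in cases; genes 1 and 2 are uninformative.
X = [[1, 0, 1]] * 10 + [[0, 0, 1]] * 10
y = [1] * 10 + [0] * 10
print(repeated_cv(X, y))
```

Pooling the confusion counts over all twenty-five rounds is one of several reasonable ways to summarize the misclassification errors; averaging per-round rates would be an alternative.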

To detect potential confounding effects of clinical variables (e.g. age, race), we will use both supervised and 
unsupervised methods. We will perform standard unsupervised hierarchical clustering 51 to explore potential 
subgroups within the patient set and examine the relationship of such subgroups to relevant clinical factors. The 
supervised analysis will be based on the classification methods described above (e.g. naive Bayes) with clinical
factors added as potential classifiers. Both main effects (effects of clinical factors on disease status) and 
interaction effects (different disease/methylation relationship within different clinical subgroups) will be 
explored. 

Specimens. Three sets of samples will be analyzed: a learning set to develop the biomarker as described
above (35 open-label samples from ADH patients, 35 samples from healthy controls, and 35 samples from controls
with benign diseases); the first validation set to assess the biomarker performance for ADH detection (20
samples from patients with ADH, 20 healthy and 20 benign controls); and the second validation set to assess the
biomarker performance for ADH diagnosis (100 blinded samples from women diagnosed with different breast
diseases). All validation samples will be blinded by the principal investigator before submitting the data for
biostatistical analysis.

The sample size calculations are made with the following assumptions: 

1) Aim 1. For biomarker selection, we assume markers will be methylated at a rate of approximately 50%
in cases and 25% in controls, and an initial biomarker filter of p<0.1 (two-sided Fisher's Exact). A
sample size of 35 per group will give 70% power on an individual marker basis to detect a difference in 
the methylation rate. Based on our pilot data, this power should be sufficient to identify enough 
candidate biomarkers for development of the classification algorithm. 

2) Aim 2, First Validation set. In order to demonstrate that the sensitivity and specificity are significantly
superior to 50% (two-sided Fisher's exact p-value <0.05), a sample size of 40 controls and 20 cases will
give 90% power under the assumption of 80% specificity and 85% sensitivity. 

3) Aim 3, Second validation set. Since we do not know a priori what the case/control distribution in the 
second validation cohort will be, the power analysis is based on overall accuracy. Assuming that the 
overall diagnostic accuracy is 80%, a sample size of 100 subjects will be required for the 95%
confidence interval for overall accuracy to have a width of ±7.5%.
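
The Aim 1 and Aim 3 assumptions can be checked with a short calculation. The sketch below computes the exact power of a two-sided Fisher's exact test by enumerating every possible outcome, and the half-width of a normal-approximation (Wald) 95% confidence interval. The proposal does not state which software or interval method was used, so the function names and the choice of methods are assumptions, and the results may differ somewhat from the quoted figures.

```python
from math import comb, sqrt

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def fisher_power(n, p_case, p_control, alpha):
    """Exact power with n subjects per group: probability that p < alpha."""
    power = 0.0
    for x1 in range(n + 1):          # methylated cases
        for x2 in range(n + 1):      # methylated controls
            if fisher_two_sided(x1, n - x1, x2, n - x2) < alpha:
                power += binom_pmf(x1, n, p_case) * binom_pmf(x2, n, p_control)
    return power

def ci_half_width(p, n, z=1.96):
    """Half-width of a Wald 95% confidence interval for a proportion."""
    return z * sqrt(p * (1 - p) / n)

print(fisher_power(35, 0.50, 0.25, 0.10))  # Aim 1: compare with the stated 70% power
print(ci_half_width(0.80, 100))            # Aim 3: compare with the stated ±7.5%
```

For Aim 3 the Wald half-width works out to roughly 7.8 percentage points, close to the ±7.5% quoted above. For Aim 1 the exact test is usually somewhat conservative compared with approximate power formulas.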

Evaluation of results. Progress of the project will be evaluated by the selection of genes for the biomarker, 
and by the misclassification rates after cross-validation (see Statistical Analysis). Based on pilot data, we expect 
that the test will have 85% sensitivity and 80% specificity. The project will be evaluated at each of the three 
stages (development, validation of detection, validation of diagnosis) using the same criteria. Accuracy of 
detection and diagnosis by the methylation assay will be determined as the match to the pathology report. Even
if the accuracy is below the projected level, the results will still be an important step, indicating that a
methylation-based diagnostic assay is possible if adjustments to the approach are made (see below).

Two independent validation sets will be used to confirm the sensitivity and specificity of the biomarker. Our 
data show sensitivity of 87.5% and specificity of 94.99% for ADH detection in tissues (Table 2); results of 
DCIS detection in plasma (Table 4) indicate that 85% sensitivity and 80% specificity are realistic.
The first validation set contains 20 ADH and 40 control samples; this validation will assess the biomarker
performance for ADH detection. In the second validation set the fraction of ADH will not be known a priori, 
and the set will contain different types of breast pathology. This set will provide a realistic assessment of the 
power of the methylation biomarker to diagnose ADH, i.e. to identify ADH amongst a number of different 
breast diseases. 

Validation of methylation results is done by bisulfite sequencing for selected genes in five randomly chosen 
specimens from each group. Bisulfite sequencing is done as before with several modifications for limited 
amounts of starting template. DNA is purified, 52 and treated with 9M bisulfite for 40 min at 70°C to reduce 
DNA degradation. PCR primers for selected regions are designed with MethPrimer. 54 Amplified regions are 
cloned into pGEM-T-Easy and 20 individual clones are sequenced from both ends using M13 sequencing
primers. 36 

► All techniques for this assay have been tested with multiple specimens of DNA from tissues, and from 
plasma of cancer patients and healthy controls. The assay has been extensively controlled and validated, so no 
significant problems are expected. 

Potential problems and solutions. 

Technical problems. Difficulties in morphological discrimination between DCIS and ADH may result in 
different assessment of the same sample by different pathologists; such samples will be considered as borderline 
DCIS and replaced. The analytical space of 56 genes may not be sufficient to find genes for stage-specific 
diagnosis, and profiles for ADH and DCIS may be substantially similar; the accuracy of the biomarker will then 
be determined in comparison of "disease" vs. control groups, and appropriate restrictions will be described for 
the clinical use of the biomarker. 

Biological problems. ADH, DCIS, and IDC may represent different morphological forms of the same disease;
in that case morphological differences will not be reflected at the molecular level, and molecular differentiation between
these forms will be impossible. If that is the case, the project will define and validate the biomarker for the 
earliest possible molecular detection of breast cancer, and as such will have the same impact on clinical 
practice. Finally, profiles of benign diseases may be too similar to the ADH-DCIS-IDC profile; this result will 
indicate that benign diseases contain some elements of malignant growth. To discriminate between these 
possibilities we consider a follow-up study of women with benign diseases. 

► None of these problems will affect the course of the study, but rather will be the result of the project. 
Clinical application will not be affected because the biomarker will still identify the earliest possible stage of 
the disease. Our understanding of breast cancer will be advanced whether epigenetic differences between ADH 
and DCIS are detected or not. 

There is a possibility, which is not very likely according to the preliminary data, that the projected accuracy 
of 85% sensitivity and 80% specificity will not be achieved using the current set of genes. In that case we will 
select no more than five fragments with the most significant differences between ADH and control samples 
(assessed by the p-value) as the basis of the composite biomarker. Additional genes will be selected from
the literature and tested in batches; the most differentially methylated genes (assessed by the p-value) will be added
to the composite biomarker. 

Finally, reduced accuracy of detection may reflect heterogeneity within the ADH group - it is possible that 
two subgroups of ADH patients have similar morphological changes, and only one of them has higher risk of 
breast cancer. In this case we may expect to detect heterogeneity in molecular profiles, which will reduce 
accuracy of ADH detection. To test this possibility we will perform unsupervised hierarchical clustering as 
described above using centered Pearson correlation to measure the distance between the two groups. The 
threshold value separating the two clusters will be selected to be higher than the similarity values within each 
cluster. Biomarker testing will then be done separately for each group, and their pathological characteristics will 
be compared. At this time, it is impossible to estimate the likelihood of this situation, but we are prepared to re- 
evaluate our results should experimental data point in this direction. 

• Clinical impact. A blood-based detection of ADH will make a considerable impact on breast cancer by 
providing a minimally invasive molecular test for the earliest stage of the disease. Selection of a subset of genes
for testing will provide an opportunity to develop a relatively inexpensive screening assay that can be offered
for population-wide screening. The impact will be especially significant for younger women with dense breasts, 
because in this group mammography is ineffective as a screening tool and is not recommended by the American 
Cancer Society. Identification of ADH patients as a high-risk group will allow better targeted interventions,
including preventive regimens. Even if molecular profiles of ADH and DCIS turn out to be indistinguishable, 
the project will create a clinically appropriate biomarker for the earliest possible detection of breast cancer. By 
the end of the project the test for detection of ADH will be developed and validated. The test will allow 
repeated sampling and will be ready for validation in a prospective trial, which will involve screening of 
asymptomatic women to correlate the test's performance with the results of mammography/biopsy.

• Innovation. To our knowledge this is the first test that evaluates methylation status of 56 genes using cell- 
free DNA from 0.2 ml of plasma. This test is also the first test to achieve 84.5% sensitivity and 80% specificity 
for DCIS detection. While each step of the assay is not novel by itself, the end result has been achieved for the 
first time. The assay can be adjusted to include additional genes for improved sensitivity and specificity. A 
statistical algorithm for tumor detection has been developed specifically for this assay. The assay is intended as 
a tool for assessment of personal risk of breast cancer development, and as a method for individualized cancer 
monitoring. It is important to note that the assay serves to identify components of the biomarker rather than to
perform the test itself; multiple techniques can be used to analyze a limited number of genes once those 
informative for the disease have been selected. 

Collaborative Efforts and Future Directions 

This project involves collaboration between the PI (development of the assay) and the breast cancer 
oncologist (Dr. Hoskins, OSF Saint Anthony Center for Cancer Care, Rockford, IL). Once the assay is 
developed and validated, additional funds will be sought to initiate a clinical trial at this clinic. The goals of the 
trial will be to correlate performance of the developed methylation test with mammography/biopsy techniques
in women presenting at the OSF Saint Anthony Center for Cancer Care. 

The biomarker developed in this project will be ready for clinical application, which will be done using a 
simple PCR-based detection. This shift in technology depends on the task - while the microarray-based 
technique is valuable as a tool for screening multiple genes in each sample, it is too sophisticated for routine use 
in a clinical laboratory. Moreover, its use will be unnecessary since the informative genes will already be 
identified, and a PCR-based technique will be the most appropriate clinical test for a small number of 
informative methylation markers selected in this project. 

Dissemination Plan. 

The results of the project will be published in three installments: development of the composite 
biomarker and its performance in an open set (target - Breast Cancer Research); performance of the biomarker 
with the blinded set of samples from breast cancer patients and healthy controls (target - Clinical Cancer 
Research); and performance of the biomarker in the blinded set of samples from breast care clinic (target - 
Journal of Clinical Investigation). Results will also be presented at the San Antonio Breast Cancer Symposia, 
and AACR meetings.